Abstract

Malware is software with malicious intent. Besides viruses and worms, spyware, adware, and other
newer forms of malware have recently emerged as widespread threats to system security. It is difficult
to detect malware reliably because new and polymorphic variants appear frequently. It is also difficult to
remove malware and repair its damage to the system because some malware programs can extensively
modify a system.

We propose a novel framework for automatically removing malware and repairing its damage to a 
system. The primary goal of our framework is to preserve system integrity. Our framework monitors 
and logs untrusted programs' operations. Using these logs, it can completely remove malware programs 
and their effects on the system, and reliably restore the infected data. Our framework does not require 
signatures or other prior knowledge of malware behavior. We implemented this framework on Windows 
and evaluated it with seven malware programs, including spyware, Trojan horses, and email worms. Comparing our tool with two
popular commercial anti-malware tools, we found that our tool detected all the malware's modifications 
to the system detected by the commercial tools, but the commercial tools overlooked up to 97% of the 
modifications detected by our tool. The runtime overhead and log space overhead of our prototype tool 
are acceptable. Our experience suggests that this framework offers an effective new defense against 
malware. 

1 Introduction 

Malware is software with malicious intent. Besides traditional malware such as viruses and worms, 
newer forms of malware, such as spyware and adware, have emerged rapidly and spread widely. A recent 
study showed that a significant number of computers running Windows in a major research university were 
infected with one or more malware programs [20]. Another recent study showed that one in three computers 
has malicious code on it [16].

The most common defense against malware is detection. However, since most detectors search for 
malicious code patterns (also known as signatures) of known malware, they cannot reliably detect new 
malware or variants of known malware (also known as polymorphic malware). Moreover, some useful 
programs are piggybacked with malicious code. For example, many P2P applications carry code that will 
install adware or spyware that is very difficult to remove [9]. As these malware programs accumulate, the 
computer often becomes unusable due to slow response time, exhausted storage, and frequent application 
crashes. In short, even good malware detectors cannot protect the user from running malware programs. 

Given that the user cannot avoid running malware on his system, the next defense is to remove the 
malware once the user notices its adverse effect on his computer. Typically, removing a malware program 
involves removing all the components (such as files and registry entries) installed by this program and 
restoring all the data modified or deleted by this program. Common approaches for removing a malware 
program include:

• Running an anti-malware program to remove all the components of the malware. However, because
of its reliance on known malware signatures, this approach cannot reliably remove new or polymorphic
malware, nor can it restore infected data.

• Taking periodic snapshots of the system, and restoring the infected system to the last clean snapshot. 
However, this approach will destroy all the new data created after the snapshot, even if they are clean. 
Although the user may avoid this problem by saving the new data that are clean, manually determining 
which data are clean is laborious and unreliable. 

• Formatting the disk and reinstalling the operating system. This drastic approach will destroy all the
user data and configurations, and therefore should only be used as the last resort. Unfortunately, since 
most other approaches often fail to remove all the components of the malware program reliably, this 
approach is often advised and followed. 

These problems call for a better approach, one that can remove all the components of both known 
and unknown malware, that can restore data infected by malware while preserving clean data, and that 
requires no user intervention. We introduce Back to the Future, a framework for achieving these goals. 
The framework monitors and logs operations of untrusted programs designated by the user, and can remove 
all the components of the untrusted programs and restore the infected data at the user's request. In other 
words, this framework allows the user to run untrusted programs without compromising the integrity of the 
system. If an untrusted program turns out to be spyware, the framework can remove all the components 
of the malware automatically and reliably; if the untrusted program turns out to be a virus, the framework 
can also restore all the infected files automatically. We name this framework Back to the Future because 
conceptually we have first rolled back the system to a prior good state (before the malware ran). From there, 
we then bring only the trusted processes back to their pre-recovery state (for the prior good state this is the 
future). 

The primary security goal of our framework is integrity: We want to preserve the integrity of the system 
while the user is running malware programs. In some cases, our framework can also provide availability: 
by completely removing malware from the system, it will free the resources usurped by the malware. Our 
framework does not aim at providing confidentiality. However, if the user can indicate confidential infor- 
mation on his system, our framework can incorporate this information and provide confidentiality. Without 
this information, once the user begins to run a malware program, it is very difficult to preserve confidentiality. Nevertheless,
our approach may help protect confidentiality in some cases: By stopping running malware before
it violates system integrity, our approach can prevent the leak of confidential information that may hap- 
pen afterward. Our framework may seem similar to sandboxing environments; however, unlike a typical 
sandboxing environment, our framework does not require system or application specific rules about what 
operations are allowed (see Section 6 for further discussions). 

Our framework monitors untrusted processes, and removes them and their effects on the system auto- 
matically at the user's request. However, our framework needs the user to decide which programs are trusted 
and which are untrusted. On the surface, this requirement seems as difficult as malware detection, but in 
fact, our framework only expects the user to evaluate the trustworthiness of a program conservatively: When 
in doubt, the user should consider the program as untrusted. In practice, there are often sound heuristics for 
deciding if a program is trusted. It is often reasonable to consider programs from reliable sources as trusted, 
such as all pre-installed applications on a new computer from a reputable vendor. There is no harm in misclassifying
a non-malware program as untrusted, except for incurring some performance penalty. (We will
discuss performance issues in Section 4.) 

Our framework needs to solve two challenges. First, it needs to prevent trusted programs from being
contaminated by untrusted programs, which happens when a trusted program reads data written by an untrusted program.
Second, it needs to define a semantics for system recovery, i.e., for removing malware programs and their
effects. Intuitively, after recovery, the system should look as if only the trusted applications have run, and
the untrusted applications have never been installed or run. We will describe our solutions to both these
challenges in Section 2.

We summarize the major contributions of our paper: 

• We propose a new framework for preserving system integrity while allowing the user to run untrusted 
programs. The framework monitors and logs untrusted programs' operations, and uses these logs for 
removing the untrusted programs and their effects completely and automatically. Since this framework 
does not need any prior knowledge about the untrusted program, it can defend against both known 
and unknown malware. 

• Our framework permits an untrusted program to run as long as possible, until it may interfere with a
trusted program. When this happens, our framework offers the user flexible choices: The user may 
terminate the untrusted program and undo its effects on the system, terminate the affected trusted 
program, or let the untrusted program continue but reclassify the affected trusted program as untrusted. 

• Our framework provides a transparent environment for running both trusted and untrusted programs. 
The user does not need to modify any existing programs. No program should notice that it is running 
in our framework. 

• We have implemented a prototype of our framework on Windows, where the threat of malware is 
greatest, and evaluated it with seven malware programs, including spyware, Trojan horses, and email worms. Comparing our tool
with two popular commercial anti-malware tools, we found that our tool detected all the malware's 
modifications to the system detected by the commercial tools, but the commercial tools overlooked 
up to 97% of the modifications detected by our tool. The runtime overhead and log space overhead of 
our prototype tool are acceptable. 

The rest of this paper is organized as follows. Section 2 describes the framework and focuses on system 
integrity and system recovery. Section 3 describes how we implemented this framework on Windows, 
and Section 4 evaluates this framework with respect to its effectiveness and performance. In Section 5, 
we discuss the security and correctness of our prototype. Section 6 reviews related work, and Section 7 
concludes with a discussion of future work. 

2 Framework 

2.1 Overview 

Figure 1 illustrates the three components of Back to the Future: a monitor, a logger, and a recovery 
agent. The monitor interposes between processes and the operating system for intercepting all the read and 
write operations of the processes. The logger records some write operations of the untrusted processes. 
When the monitor determines that an untrusted process may harm a trusted process, it invokes the recovery 
agent to restore system integrity. 

Our framework needs to solve two challenges. First, how does it determine when an untrusted program 
may violate the integrity of the system? Second, how does it remove all the effects of an untrusted program? 
The next two sections describe our solutions to these two challenges. 

2.2 System Integrity 

This section defines the notion of system integrity, describes a criterion for checking when an untrusted 
program may violate system integrity, and discusses how to preserve system integrity. 



[Figure 1: Framework for monitoring, logging and recovery. Diagram labels: Trusted Process, Untrusted Process, Monitor, Operating System, Logger, Recovery Agent.]



2.2.1 Integrity Model 

We start with Biba's integrity model [3], which says that no subject can read objects of lower integrity 
levels, and no subject can write objects of higher integrity levels. Our framework defines two integrity 
levels: trusted and untrusted. Applying Biba's model, our framework would require that trusted processes 
should not read untrusted data, and untrusted processes should not write trusted data. 

Strictly following Biba's model, however, considerably limits the user's ability to run untrusted pro- 
grams. For example, the framework would have to stop an untrusted process immediately when the process 
tries to overwrite trusted data. If no trusted process will ever read this data again (e.g., this data is a tem- 
porary scratch), stopping the untrusted process is unnecessary. Even if some trusted process will read this 
data, the framework does not have to intervene until just before the read operation happens. 

Hence, we adopt a relaxed integrity model. Our model only requires that no trusted process should read 
untrusted data, but untrusted processes can freely write (or overwrite) any data they desire. This model can 
be viewed as a lazy Biba's model: It does not enforce integrity until the point where untrusted data will flow 
into trusted processes. The laziness in our model allows the user to run more untrusted applications and run 
them for longer periods of time. 
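
The difference between the strict and the relaxed model can be made concrete with a small sketch. The following Python code is only an illustration under our own assumptions: the names Level, strict_biba_allows, and lazy_model_allows are hypothetical and do not come from the prototype.

```python
from enum import Enum


class Level(Enum):
    """The two integrity levels used by the framework."""
    UNTRUSTED = 0
    TRUSTED = 1


def strict_biba_allows(subject: Level, op: str, obj: Level) -> bool:
    """Strict Biba: no read of lower-integrity data, no write of higher-integrity data."""
    if op == "read":
        return obj.value >= subject.value
    if op == "write":
        return obj.value <= subject.value
    raise ValueError(f"unknown operation: {op}")


def lazy_model_allows(subject: Level, op: str, obj: Level) -> bool:
    """Relaxed model: the only blocked case is a trusted read of untrusted data."""
    if op not in ("read", "write"):
        raise ValueError(f"unknown operation: {op}")
    return not (op == "read"
                and subject is Level.TRUSTED
                and obj is Level.UNTRUSTED)
```

Under this sketch, an untrusted write to trusted data is rejected by the strict check but accepted by the relaxed one, while a trusted read of untrusted data is rejected by both.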

2.2.2 Preserving System Integrity 

To preserve system integrity, the framework must intervene when a trusted process is about to read 
untrusted data. We argue that a good intervening approach should satisfy the following properties: 

• Preserving the consistency of processes: The approach should preserve the consistency of both trusted 
and untrusted processes. This means that if the approach allows a process to continue, it should 
not change the process's behavior. For example, the approach should not selectively deny certain 
operations of the process. 

• Allowing processes to run as long as possible: The approach should allow a process to run as long as 
possible until it cannot preserve system integrity or the consistency of some processes. 

We propose the following options for preserving system integrity. 

• Deny the operation: This option preserves system integrity by denying this read operation. To preserve 
the consistency of the trusted process that issued the read operation, the framework must also terminate 
the trusted process.

• Allow the operation: We can allow this read operation but still preserve system integrity with the
following approaches: 

- Terminate the untrusted process: We terminate the untrusted process that has written the untrusted
data, and restore the old data at the same location. If the restored data d is still untrusted,
we terminate the process that has written d, restore the old data, and repeat this procedure until
the data that we have restored is trusted. This solution preserves system integrity by replacing
untrusted data with trusted data.

- Mark the trusted process as untrusted: We begin to treat the trusted process as untrusted. Notice, 
however, that we do not need to remove the data written by this process in the past. Since this 
process is now untrusted, we allow the read operation to continue, as untrusted processes can 
read any data. This solution preserves system integrity by reducing the set of trusted processes. 
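
The "terminate the untrusted process" option repeats a restore step until trusted data is reached. A minimal sketch of that loop follows. It assumes, purely for illustration, that every location keeps a full version history; the Version record and the restore_until_trusted function are hypothetical names, and the prototype need not store data this way.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Version:
    data: bytes      # contents written to the location
    writer: str      # name of the process that wrote them
    trusted: bool    # whether the writer was trusted


def restore_until_trusted(
    history: List[Version],
) -> Tuple[Optional[Version], List[str]]:
    """Pop untrusted versions (newest first) until a trusted one is on top.

    Returns the version that remains at the location (None if no trusted
    version exists) and the writers to terminate, in termination order.
    """
    versions = list(history)          # do not mutate the caller's list
    terminated: List[str] = []
    while versions and not versions[-1].trusted:
        terminated.append(versions.pop().writer)
    return (versions[-1] if versions else None), terminated
```

For example, if a trusted editor wrote a file that two untrusted processes later overwrote in turn, the loop terminates the later writer first, then the earlier one, and leaves the editor's data in place.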

Sometimes we may want to preserve good effects of untrusted applications. In such cases, we can
mark the untrusted data as trusted and let the read operation continue. As an example, consider media 
files downloaded by malware-laden P2P applications. If the user is confident that the media files will not 
affect trusted applications, we can allow a trusted application, say a media player, to play these media files. 
However, if the user is mistaken, system integrity may be violated. 

2.3 System Recovery 

This section describes how the framework removes all the effects of untrusted programs on the system. 

2.3.1 Basic Approach for System Recovery 

We first describe a basic approach for system recovery that is conceptually simple and serves as a refer- 
ence for reasoning about the correctness of a more efficient but complex approach. During monitoring, the 
framework logs all the operations of both trusted and untrusted processes; during recovery, the framework 
first undoes all the logged operations of both trusted and untrusted processes reverse-chronologically, and 
then redoes all the logged operations of only the trusted processes chronologically. 

We next elaborate on this approach. Given a definition of the state of a system (e.g., the state consists of 
the file system and the registry), we can divide the operations of all the processes into two categories: read 
operations (which do not change the system state) and write operations (which do change the system state). 
Since our goal is to remove all the effects of the untrusted processes on the system, the framework needs 
to log only the write operations. This approach requires the framework to log the write operations of both 
trusted and untrusted processes. Moreover, since the framework needs to undo the write operations during 
the recovery phase, it needs to log the old data overwritten by each write operation during the monitoring 
phase. 

One can argue that after recovery this basic approach brings the system to a state that looks as if the 
untrusted processes have never run. However, this approach is inefficient, because during recovery it first 
undoes each write operation by the trusted processes and later redoes the same operation. For most write 
operations, undoing them followed by redoing them will have no net effect. We could save time by avoiding 
undoing and redoing these write operations, and save space by not logging these write operations. 
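
The basic approach can be sketched in a few lines. In this illustrative Python model, the system state is a dictionary from location names to data; WriteOp, apply_write, and basic_recover are hypothetical names chosen for the sketch, and None stands for "no data at this location".

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class WriteOp:
    trusted: bool              # was the writer a trusted process?
    location: str              # e.g., a file path or registry value name
    new_data: Optional[str]    # data written by this operation
    old_data: Optional[str]    # data overwritten (None if there was none)


def apply_write(state: Dict[str, str], log: List[WriteOp],
                trusted: bool, location: str, data: str) -> None:
    """Monitoring phase: log every write together with the data it overwrites."""
    log.append(WriteOp(trusted, location, data, state.get(location)))
    state[location] = data


def _set(state: Dict[str, str], location: str, data: Optional[str]) -> None:
    if data is None:
        state.pop(location, None)
    else:
        state[location] = data


def basic_recover(state: Dict[str, str], log: List[WriteOp]) -> None:
    """Recovery phase: undo everything newest-first, then redo trusted writes in order."""
    for op in reversed(log):
        _set(state, op.location, op.old_data)
    for op in log:
        if op.trusted:
            _set(state, op.location, op.new_data)
```

After recovery, every location holds either the data of the last trusted write or nothing, which matches the intuition that the untrusted processes never ran.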

2.3.2 Refined Approach for System Recovery 

We refine the basic approach by skipping the write operations for which undoing followed by redoing
has no net effect. We motivate the refined approach with two examples, in which
a trusted process T and an untrusted process U write to the same data location, but in different orders:

• Example 1: T writes before U writes. During recovery, the framework only needs to undo U's write
operation; it does not need to undo and then redo T's write operation, and it does not need to log
T's operation during monitoring.

• Example 2: U writes before T writes. During recovery, the framework does not need to undo either
U's write or T's write, because T's trusted data has overwritten U's untrusted data.

These two examples suggest that we can detect unnecessary undoing and redoing by tracking the order 
in which trusted processes and untrusted processes write to the same data location. In fact, it suffices to track 
whether each location contains trusted or untrusted data. In this refined approach, during monitoring: 

• When a trusted process writes to a data location, mark the new data in the location as trusted. 

• When an untrusted process writes to a location: 

- If the location contains trusted data, log this write operation, save the old data, and mark the new 
data in this location as untrusted. 

- If the location contains untrusted data, do nothing. 

- If the location contains no data, log this write operation, and mark the new data in this location 
as untrusted. 

During recovery, the framework examines each logged write operation reverse-chronologically. Recall 
that the framework only logs write operations by untrusted processes. For each logged write operation: 

• If its location currently contains untrusted data, restore the old data that was saved for this operation in the log.

• If the location contains trusted data, do nothing. 
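
The monitoring and recovery rules above can be sketched as follows. This is an illustrative Python model, not the prototype's code: RefinedTracker and its fields are hypothetical names, locations are dictionary keys, and None in the log means the location held no data before the logged write.

```python
from typing import Dict, List, Optional, Set, Tuple


class RefinedTracker:
    def __init__(self) -> None:
        self.state: Dict[str, str] = {}     # location -> current data
        self.untrusted: Set[str] = set()    # locations holding untrusted data
        # Logged untrusted writes: (location, data that was overwritten).
        self.log: List[Tuple[str, Optional[str]]] = []

    def write(self, trusted: bool, location: str, data: str) -> None:
        if trusted:
            # Trusted write: the new data is trusted, nothing is logged.
            self.untrusted.discard(location)
        elif location not in self.untrusted:
            # Untrusted write over trusted data or over no data:
            # log it, save the old data, mark the location untrusted.
            self.log.append((location, self.state.get(location)))
            self.untrusted.add(location)
        # Untrusted write over untrusted data: nothing to log.
        self.state[location] = data

    def recover(self) -> None:
        # Examine logged writes newest-first.
        for location, old in reversed(self.log):
            if location in self.untrusted:
                if old is None:
                    self.state.pop(location, None)
                else:
                    self.state[location] = old
                self.untrusted.discard(location)
            # If the location now holds trusted data, do nothing.
        self.log.clear()
```

Note that only the first untrusted write after trusted data (or after no data) produces a log entry, which is where the space saving over the basic approach comes from.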

Proof of Correctness. We prove that this refined approach achieves the same result as the basic approach.
Given a data location, let the entire sequence of write operations at this location before system recovery
be $O_1, \ldots, O_n$. We consider two cases, depending on whether the last operation $O_n$ is from a trusted or an
untrusted process:

• Case 1: The last write operation $O_n$ is from a trusted process. Using the basic approach, the framework
will first undo $O_n, \ldots, O_1$, in this particular order, and then redo only the operations in $O_1, \ldots, O_n$ that
are from trusted processes, in that order. Since $O_n$ is from a trusted process and is the last operation
performed during recovery, this location will contain the data written by $O_n$ after recovery. Using the
refined approach, the framework will notice that the location already contains trusted data, so it will
do nothing on this location. Since before recovery this location already contains data written by $O_n$,
after recovery using the refined approach, this location will contain the same data as when using the
basic approach.

• Case 2: The last write operation $O_n$ is from an untrusted process. Let $O_t$ be the last write operation
by a trusted process in this sequence. Now the sequence is $O_1, \ldots, O_t, O_{t+1}, \ldots, O_n$ where all
$O_{t+1}, \ldots, O_n$ are from untrusted processes. Using the basic approach, the framework will first undo
$O_n, \ldots, O_{t+1}, O_t, \ldots, O_1$, and then redo only the operations in $O_1, \ldots, O_t$ that are from trusted processes.
Since $O_t$ is from a trusted process, it will be the last operation that the framework redoes on
this location, so this location will contain the data written by $O_t$ after recovery. Using the refined
approach, during monitoring the framework will log $O_{t+1}$, but will not log any operation after $O_{t+1}$,
because $O_{t+1}$ writes untrusted data into this location. During recovery, the framework will first undo
$O_{t+1}$ by replacing the data in this location with the data that was in this location before $O_{t+1}$, which
was exactly the data written by $O_t$. After that, this location contains trusted data because $O_t$ is from a
trusted process, so the framework will not change the data in this location any more. Therefore, both
the basic and the refined approach restore the same data into this location.
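
As a sanity check on the argument, the two approaches can be compared exhaustively on short write sequences at a single location. The sketch below is our own illustration: a sequence is a tuple of booleans (True for a trusted write), the i-th operation writes the value i, and None models a location with no data. The functions basic, refined, and check_equivalence are hypothetical names.

```python
from itertools import product
from typing import Optional, Sequence


def basic(seq: Sequence[bool]) -> Optional[int]:
    """Log every write; undo all newest-first; redo trusted writes in order."""
    value: Optional[int] = None
    log = []
    for i, trusted in enumerate(seq, start=1):
        log.append((trusted, value, i))   # (trusted?, old data, new data)
        value = i
    for _trusted, old, _new in reversed(log):
        value = old
    for trusted, _old, new in log:
        if trusted:
            value = new
    return value


def refined(seq: Sequence[bool]) -> Optional[int]:
    """Log only untrusted writes that overwrite trusted data or no data."""
    value: Optional[int] = None
    holds_untrusted = False
    log = []                              # saved old data, one per logged write
    for i, trusted in enumerate(seq, start=1):
        if trusted:
            holds_untrusted = False
        elif not holds_untrusted:
            log.append(value)
            holds_untrusted = True
        value = i
    for old in reversed(log):
        if holds_untrusted:
            value = old
            holds_untrusted = False
    return value


def check_equivalence(max_len: int = 8) -> bool:
    """Compare both approaches on every sequence of up to max_len writes."""
    for n in range(max_len + 1):
        for seq in product([True, False], repeat=n):
            if basic(seq) != refined(seq):
                return False
    return True
```

In this model both functions return the value of the last trusted write, or None when no trusted write exists, which corresponds to Case 1 and Case 2 of the proof.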

3 Implementation 

To evaluate our framework, we have developed a prototype implementation for the Windows XP oper- 
ating system. The implementation consists of the three essential components of the framework: a monitor, 
logger, and recovery agent. Our monitor is a Windows kernel driver that hooks relevant system services and 
can therefore capture most of the interactions between user processes and the operating system. The logger 
and recovery agent are user applications that interact with the driver. 

3.1 Monitoring 

3.1.1 System Service Hooking 

In Windows NT 4, 2000, and XP, user applications rely on the interface exposed from a set of libraries, 
such as kernel32.dll and user32.dll, to access operating system services. This interface is known as the 
Win32 API. Since the functions in this interface play a similar role as system calls in Unix, we will use the 
terms system services and system calls interchangeably. These functions call lower level system services 
to complete the services eventually. For example, if a user application intends to set a registry value, it 
would generally call the Win32 RegSetValue function provided by the library advapi32.dll. This function 
is a wrapper for an underlying call to NtSetValueKey, which is implemented in ntdll.dll. Functions whose 
names start with Nt or Zw are known as the Native API. Since Microsoft only documents a small percentage 
of the Native API, applications generally do not call these functions, although they can [15]. 

When the kernel traps system call interrupts, it uses a unique identifier found in the call to look up a 
function pointer in the service dispatch table. Kernel drivers can modify this table to wrap system calls with 
arbitrary code. This technique, known as API hooking, allows us to intercept all the system calls made by 
any process [14, 22]. 
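
The idea of wrapping an entry in a dispatch table can be illustrated with a user-space toy model. The following Python sketch is only an analogy: real hooking happens inside a kernel driver, and the table, the service identifier SERVICE_SET_VALUE_KEY, and all function names here are hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple

# Toy stand-in for the service dispatch table: service id -> handler.
dispatch_table: Dict[int, Callable[..., Any]] = {}

SERVICE_SET_VALUE_KEY = 1  # hypothetical identifier, chosen for this sketch


def set_value_key(key: str, value: str) -> str:
    """Stand-in for the original system service."""
    return f"set {key}={value}"


dispatch_table[SERVICE_SET_VALUE_KEY] = set_value_key

# Calls observed by the hook, as (service id, arguments) pairs.
intercepted: List[Tuple[int, tuple]] = []


def hook(table: Dict[int, Callable[..., Any]], service_id: int,
         on_call: Callable[[int, tuple], None]) -> Callable[..., Any]:
    """Replace a table entry with a wrapper that observes the call, then forwards it."""
    original = table[service_id]

    def wrapper(*args: Any) -> Any:
        on_call(service_id, args)      # monitoring code runs first
        return original(*args)         # then the original service

    table[service_id] = wrapper
    return original                    # kept so the hook can be removed later


def system_call(service_id: int, *args: Any) -> Any:
    """Look up the handler by identifier and invoke it, as a dispatcher would."""
    return dispatch_table[service_id](*args)
```

Once hook(dispatch_table, SERVICE_SET_VALUE_KEY, ...) has run, every call routed through system_call is seen by the callback before the original handler executes, and the caller still receives the original result.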

Although system call hooks allow the framework to monitor all system calls, the framework needs to 
monitor only the system calls that access the file system and the registry, and that create new processes. 
System calls that access the file system include ZwCreateFile, ZwOpenFile, ZwDeleteFile, ZwQueryInformationFile,
ZwSetInformationFile, ZwReadFile, and ZwWriteFile. System calls that access the registry include
ZwCreateKey, ZwOpenKey, ZwDeleteKey, ZwSetValueKey, ZwDeleteValueKey, ZwRestoreKey, ZwQueryValueKey,
and ZwSetInformationKey. System calls that create new processes include ZwCreateProcess and
ZwCreateProcessEx.

3.1.2 Tracking Untrusted Data 

A significant component of the monitor tracks which data are untrusted as both trusted and untrusted 
processes proceed, because our integrity model requires that no trusted process should read untrusted data. 
In the implementation of this component, two key issues are granularity and metadata: At which granularity 
does this component track untrusted data, and how to record the trustworthiness information in metadata? 

To determine the best granularity, we need to strike a balance between precision and overhead, as smaller
granularities improve precision but increase overhead. We implemented different granularities for registry
values and files. The granularity for the registry is one registry value, because most registry values are small. On
the other hand, files can become very large, so file-level granularity would be too coarse.
Thus, we track the trustworthiness of each byte in files by recording the ranges of untrusted data in each file.

Process     Process's Operation   Old Status of Target Data   Monitor's Action
Trusted     Delete file           Trusted                     -
                                  Untrusted                   Remove file from watch list
            Write data            Trusted                     -
                                  Untrusted                   Mark new data as trusted
            Read data             Trusted                     -
                                  Untrusted                   Warn integrity violation
            Create process        Any                         -
Untrusted   Delete file           Trusted                     Mark file as deleted
                                  Untrusted                   Mark file as deleted
            Write data            Trusted                     Mark new data as untrusted
                                  Untrusted                   -
            Read data             Trusted                     -
                                  Untrusted                   -
            Create process        Any                         Monitor new process as untrusted

Table 1: Tracking untrusted data and new processes.

Our implementation maintains a table of all files and registry entries that contain untrusted values. For 
each file, an associated data structure describes which ranges in this file contain untrusted data. The monitor 
uses this table to determine if a trusted process will read untrusted data. The logger (Section 3.2) and the 
recovery agent (Section 3.3) will also use this table. Table 1 summarizes the actions taken by the monitor 
for various operations. 
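
One possible shape for the per-file data structure is a sorted list of byte ranges. The sketch below is an assumption on our part, not the prototype's actual structure: UntrustedRanges and its methods are hypothetical names, and ranges are half-open intervals [start, end).

```python
from typing import List, Tuple

Range = Tuple[int, int]  # half-open byte range [start, end)


class UntrustedRanges:
    """Sorted, non-overlapping byte ranges of one file that hold untrusted data."""

    def __init__(self) -> None:
        self.ranges: List[Range] = []

    def mark_untrusted(self, start: int, end: int) -> None:
        """Record an untrusted write to [start, end), merging touching ranges."""
        if start >= end:
            return
        merged: List[Range] = []
        for s, e in self.ranges:
            if e < start or s > end:
                merged.append((s, e))          # disjoint: keep as is
            else:
                start, end = min(s, start), max(e, end)
        merged.append((start, end))
        self.ranges = sorted(merged)

    def mark_trusted(self, start: int, end: int) -> None:
        """Record a trusted write to [start, end), trimming or splitting ranges."""
        if start >= end:
            return
        kept: List[Range] = []
        for s, e in self.ranges:
            if e <= start or s >= end:
                kept.append((s, e))            # untouched by this write
            else:
                if s < start:
                    kept.append((s, start))    # part before the write survives
                if e > end:
                    kept.append((end, e))      # part after the write survives
        self.ranges = kept

    def overlaps(self, start: int, end: int) -> bool:
        """Would a read of [start, end) touch untrusted data?"""
        return any(s < end and start < e for s, e in self.ranges)
```

A table keyed by file path could then map each watched file to one such object, and the monitor would call overlaps before letting a trusted process read.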

In addition to tracking untrusted data, the monitor also tracks and monitors processes spawned by un- 
trusted processes, which are also considered as untrusted. 
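
The monitor's decisions in Table 1 can also be written as a small lookup function. The Python sketch below reflects our reading of the table; the function name monitor_action and the operation strings are hypothetical, and None means that the monitor takes no tracking action.

```python
from typing import Optional

# (process is trusted, operation, target data is untrusted) -> action
_ACTIONS = {
    (True, "delete", True): "remove file from watch list",
    (True, "write", True): "mark new data as trusted",
    (True, "read", True): "warn integrity violation",
    (False, "delete", False): "mark file as deleted",
    (False, "delete", True): "mark file as deleted",
    (False, "write", False): "mark new data as untrusted",
}


def monitor_action(process_trusted: bool, operation: str,
                   target_untrusted: bool = False) -> Optional[str]:
    """Return the tracking action for one intercepted operation, or None."""
    if operation == "create_process":
        # The status of the target does not matter for process creation:
        # children of untrusted processes are themselves untrusted.
        return None if process_trusted else "monitor new process as untrusted"
    if operation not in ("delete", "write", "read"):
        raise ValueError(f"unknown operation: {operation}")
    return _ACTIONS.get((process_trusted, operation, target_untrusted))
```

The single case that raises an alert is a trusted process reading untrusted data, which is exactly the condition of the relaxed integrity model.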

3.2 Logging 

The second component of the implementation is logging. During recovery, the framework uses logged
information to remove malware programs and to restore infected data on the system. As discussed in
Section 2.3, the framework only needs to log write operations from untrusted processes.

Our logging mechanism is located in user space. The monitor makes appropriate backups and forwards 
information to the logger. An alternative approach would have been to manage log files directly from the
monitor (and, therefore, from a kernel driver). We decided that minimizing the footprint of the kernel
portion of our framework was important. In addition, keeping the logging outside of the kernel allows
potential future upgrades to logging mechanisms (e.g., compression, XML).

3.3 Recovery 

The final component of the implementation is recovery. Given the data created by the logging mechanism,
the recovery tool rolls back the effects of each entry until the desired system state is reached. The tool
also uses trustworthiness information about data from the monitor to determine which portions of data it
should restore. Like the logging tool, the recovery tool runs in user space.

4 Experiments



We evaluated our tool's effectiveness in detecting malware, removing malware, and restoring infected 
data, and its performance during monitoring and recovery. We tested our tool on a suite of malware programs 
consisting of: 

• Adware and spyware: eZula, Gator, and BonziBuddy. They are normally bundled with benign
programs, such as a P2P application. When the user installs the benign programs, the installers
furtively install these malware programs.

• Trojan horse: NetBus. Trojan horses are normally packaged with innocuous decoy programs. When 
the decoy programs are executed, they install and run the bundled Trojan horses. NetBus configures 
the system to allow remote access and control. 

• Email worms: Netsky and Beagle. Email worms depend on deceived users to execute email attach- 
ments to install and propagate the worms. Netsky and Beagle caused two major email worm outbreaks 
in 2004. 

• Hybrid malware: Happy99. Happy99 acts as both a Trojan horse and a worm, since it purports to be
an entertaining screen saver, and it propagates via email behind the scenes.

4.1 Recovery 

During recovery, our tool should remove all the files and registry entries installed by the malware, and
restore the original data in the infected files and registry entries. We evaluated the effectiveness of our tool's
recovery function by comparing it with two popular commercial tools: Spybot [2] and Symantec Norton
AntiVirus [1]. Spybot handles eZula, Gator, and BonziBuddy, and Symantec Norton AntiVirus handles the
rest of the malware programs used in our experiments. We compared them in two experiments:

• First experiment: after running a malware program, we first invoke the recovery function of our tool, 
and then we run a commercial tool to detect any residual traces of this malware. 

• Second experiment: after running a malware program, we first run a commercial tool to detect and 
remove the program, and then we examine whether the commercial tool has removed all the files and 
registry entries created by the malware program as logged by our tool. 

In the first experiment, for each malware program, we found that neither commercial tool could detect 
the malware after we ran our tool to remove it. Since both commercial tools could detect the malware before 
we removed it using our tool, we conclude that our tool has removed the malware to the satisfaction of 
the commercial tools. In the second experiment, we found that the commercial tools failed to remove all 
the files and registry entries that the malware programs had created. Table 2 compares the number of files 
and registry entries modified by the malware programs that were detected by our tool with those that were 
detected by the commercial tools. The table shows that, for some malware, our tool can identify more files 
and registry entries modified by the malware than commercial tools can. These include: 

• Original files and registry keys that malware has deleted from the system. E.g., W32.Netsky deleted 
a registry key associated with a component of Microsoft Internet Explorer. 

• Temporary files created by malware during its installation. E.g., eZula created temporary files while 
it was retrieving data from the network. These files were not deleted even after the commercial tool 
claimed to have removed eZula. 
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Malware 


Our Tool 


Commercial Tool 


Detected Modifications 


Detected Modifications 


False Negative 


File 


Registry Key 


File 


Registry Key 


File 


Registry Key 


eZula 


242 


195 


42 


61 


83% 


69% 


Gator 


385 


129 


151 


4 


61% 


97% 


BonziBuddy 


112 


2135 


24 


59 


79% 


97% 


NetBus 


2 


1 


2 


1 


0% 


0% 


Happy99.Worm 


2 


0 


2 


0 


0% 


0% 


W32.Beagle.AC 


44 


1 


44 


1 


0% 


0% 


W32.Netsky 


336 


8 


330 


1 


2% 


88% 



Table 2: Comparison of our tool and commercial tools' ability to detect files and registry keys modified by 
malware 

• Modifications made by other system components on behalf of the malware. E.g., BonziBuddy asked 
Microsoft Agent Services to modify the file system, but the commercial tool failed to detect the 
modified files. 
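
The false-negative percentages in Table 2 can be reproduced from the raw counts. The sketch below assumes that a false negative is a modification detected by our tool but missed by the commercial tool, expressed as a percentage of the modifications detected by our tool and rounded to the nearest integer. The text does not spell out this formula; it is an assumption that happens to match every entry in the table. The function name is illustrative.

```python
def false_negative_rate(ours: int, commercial: int) -> int:
    """Percentage of the modifications found by our tool that the
    commercial tool missed, rounded half up to an integer.

    Assumed definition: (ours - commercial) / ours.
    """
    if ours == 0:
        return 0  # nothing was modified, so nothing could be missed
    missed = ours - commercial
    # Integer arithmetic for round-half-up, avoiding float rounding quirks.
    return (200 * missed + ours) // (2 * ours)


# Counts copied from Table 2: (our tool, commercial tool).
ezula_files = (242, 42)
gator_registry = (129, 4)
netsky_registry = (8, 1)

for label, (ours, commercial) in [
    ("eZula files", ezula_files),
    ("Gator registry keys", gator_registry),
    ("W32.Netsky registry keys", netsky_registry),
]:
    print(f"{label}: {false_negative_rate(ours, commercial)}%")
```

Under this definition, a row in which both tools report the same counts (NetBus, Happy99.Worm, W32.Beagle.AC) yields 0%.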

4.2 Usability 

Our tool monitors read and write operations of both trusted and untrusted processes. When a trusted 
process reads data that were written by an untrusted process, our tool will stop the process and alert the user. 
If this alert never happens, our tool will allow an untrusted process to run to completion (the user can still 
use our tool to remove the program and its effects at any later time). However, if this alert happens often, 
the usability of our tool will suffer, because each alert requires user intervention. 
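
The alert rule described above can be sketched as a small model. This is an illustrative simplification, not the prototype's kernel driver: the class and method names are hypothetical, resources are identified by plain strings, and writes by trusted processes are not modeled.

```python
class IntegrityAlert(Exception):
    """Raised when a trusted process is about to read untrusted data."""


class IntegrityMonitor:
    def __init__(self):
        self.untrusted_pids = set()   # processes the user marked untrusted
        self.untrusted_data = set()   # resources written by untrusted processes

    def mark_untrusted(self, pid):
        self.untrusted_pids.add(pid)

    def on_write(self, pid, resource):
        # Data written by an untrusted process becomes untrusted.
        if pid in self.untrusted_pids:
            self.untrusted_data.add(resource)

    def on_read(self, pid, resource):
        # Untrusted processes may read anything; trusted processes must
        # not read data that an untrusted process wrote.
        if pid not in self.untrusted_pids and resource in self.untrusted_data:
            raise IntegrityAlert(f"trusted process {pid} read {resource}")


monitor = IntegrityMonitor()
monitor.mark_untrusted(100)                      # hypothetical malware pid
run_key = r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Run"
monitor.on_write(100, run_key)
try:
    monitor.on_read(4, run_key)                  # hypothetical trusted system pid
except IntegrityAlert as alert:
    print("alert:", alert)
```

In this model, an untrusted process that never writes anything a trusted process later reads runs to completion without an alert.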

We never saw an alert when we used our tool to run the seven malware programs mentioned earlier. 
Examining the logs carefully, however, we found that NetBus, W32.Beagle.AC, and W32.Netsky should have 
triggered alerts. They all write to the registry key 
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Run, which Windows reads during boot. 
Modifying this key allows the malware to survive a system reboot, because the system automatically 
restarts all the programs listed in this registry key. These malware programs violate our integrity 
model, because they write untrusted data into this registry key, and the system will read these 
untrusted data during the next reboot. Our framework would detect this violation if our monitor driver 
were loaded early in the boot sequence. Even in this case, our tool would alert the user only once, 
which we still consider good usability. 

4.3 Performance 

Our tool monitors the execution of all the processes on the system. It intercepts and optionally logs all 
the system service calls from untrusted processes, and it also monitors trusted processes to prevent them 
from reading untrusted data. However, most system service calls pass through our monitor quickly, and only 
the calls that modify the system state (such as the file system and the registry) may incur noticeable delays. 

The timings in Table 3 reveal that, although our implementation does increase execution 
time for the tasks, the overhead is reasonable when compared with the resource usage of a commercial 
anti-spyware or anti-virus program. Moreover, the performance numbers for unzip should be interpreted as a 
stress test of our system, since unzip consists mainly of file operations. All measurements were conducted on 
an Intel Pentium 4 2GHz desktop with 256 MB RAM and a 7200rpm IDE hard disk running Windows XP 
Workstation SP1. 

                    CPU Time
Program             Not monitored   Monitored as trusted    Monitored as untrusted    Log Size
eZula installer     3.953s          4.516s                  6.338s                    4959 KB
Kazaa installer     48.965s         59.824s                 101.466s                  12552 KB
Happy99.Worm        4.858s          4.963s                  4.937s                    6 KB
unzip (5MB file)    0.535s          0.666s                  1.013s                    336 KB

Table 3: CPU time and disk space overhead of our tool while running benign and malware programs. The 
installers and the unzip program should be considered stress tests for our tool, since they primarily read 
and write files. 
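
To make the timings in Table 3 easier to compare, the relative CPU-time overhead can be computed against the unmonitored run. The sketch below uses only the numbers in the table; the function and variable names are illustrative.

```python
# CPU times in seconds, copied from Table 3:
# (not monitored, monitored as trusted, monitored as untrusted)
TABLE3 = {
    "eZula installer": (3.953, 4.516, 6.338),
    "Kazaa installer": (48.965, 59.824, 101.466),
    "Happy99.Worm": (4.858, 4.963, 4.937),
    "unzip (5MB file)": (0.535, 0.666, 1.013),
}


def overhead_percent(baseline: float, monitored: float) -> float:
    """Extra CPU time relative to the unmonitored run, in percent."""
    return 100.0 * (monitored - baseline) / baseline


for program, (base, trusted, untrusted) in TABLE3.items():
    print(
        f"{program}: trusted +{overhead_percent(base, trusted):.0f}%, "
        f"untrusted +{overhead_percent(base, untrusted):.0f}%"
    )
```

By this measure, the file-intensive workloads see the largest slowdown when monitored as untrusted (roughly 60% for the eZula installer, 89% for unzip, and 107% for the Kazaa installer), while Happy99.Worm, which performs few file operations, stays within about 2% in both modes.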

5 Discussions 

5.1 Security 

5.1.1 Security of the Framework 

Security Goals Security has three main goals: confidentiality, integrity, and availability [3]. Our frame- 
work focuses on maintaining integrity: It allows the user to run untrusted programs without compromising 
system integrity, and it can remove the untrusted programs and all their effects on the system completely 
and automatically. Our framework does not ensure availability directly, because it does not control resource 
usage by untrusted programs; however, since our framework can remove untrusted programs and their ef- 
fects on the system, it provides availability indirectly. Our framework does not provide confidentiality, since 
it does not prevent untrusted programs from reading confidential information, nor does it monitor outgoing 
network traffic. As we discussed in Section 1, once the user starts to run untrusted programs, it is very 
difficult to maintain confidentiality in a usable way. However, we can enhance our framework to provide 
confidentiality. If the user can indicate what information is confidential on his system, we can incorporate 
this information by disallowing untrusted applications from reading confidential information. Our frame- 
work and implementation, however, do not currently support this capability, which we leave for future work. 

Security of the Logging Mechanism Our framework logs write operations by untrusted processes so that 
it can reverse these operations in the future. An adversarial untrusted process may attempt a 
denial-of-service (DoS) attack on our logging system by issuing numerous write operations. However, our 
system does not log every write operation by untrusted processes, and it saves old data only when those 
data are trusted. More specifically, we divide the write operations by untrusted processes into three 
categories: 

• The operation replaces trusted data. Our system logs this operation and the old data. 

• The operation replaces untrusted data (i.e., data written earlier by an untrusted process). Our system 
logs nothing. 

• The operation writes new data. Our system logs only this operation, because there is no old data. 

The log size in the first case may be large because the old data may be large, the log size in the third 
case is small, and the log size in the second case is zero. Therefore, an adversarial untrusted process 
cannot mount an effective DoS attack on our logging system by writing large amounts of new data or by 
repeatedly overwriting the same location. The only effective attack is to overwrite large amounts of trusted 
data, which we can deal with by limiting the maximum amount of data that an untrusted process may overwrite. 
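
The three categories can be sketched as a small logging policy. This is a simplified model under stated assumptions, not the prototype's code: whole locations are replaced at once, the data store is an in-memory dictionary, and the names WriteLogger and untrusted_write are hypothetical.

```python
class WriteLogger:
    """Models which writes by untrusted processes produce log entries."""

    def __init__(self, trusted_store):
        self.store = dict(trusted_store)  # location -> current data
        self.untrusted = set()            # locations holding untrusted data
        self.log = []                     # (location, old_data or None)

    def untrusted_write(self, location, data):
        if location in self.untrusted:
            # Category 2: replaces untrusted data, so nothing is logged.
            pass
        elif location in self.store:
            # Category 1: replaces trusted data; log the operation and old data.
            self.log.append((location, self.store[location]))
        else:
            # Category 3: writes new data; log the operation only.
            self.log.append((location, None))
        self.store[location] = data
        self.untrusted.add(location)


logger = WriteLogger({"config.ini": b"trusted settings"})
for i in range(1000):
    # Repeatedly overwriting the same location adds only one log entry.
    logger.untrusted_write("config.ini", b"junk %d" % i)
logger.untrusted_write("new.tmp", b"x" * 10_000)
print(len(logger.log))
```

In this model, the log grows only with the amount of trusted data overwritten plus one small entry per newly created location, which is why the cap on overwritten trusted data bounds the log size.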

Security of the Dichotomy of Trustworthiness We assume that once the user considers a process trusted, 
it remains trusted until the user explicitly reclassifies it as untrusted. This ignores the possibility that a trusted 
but vulnerable process may become untrusted because malicious code has been injected into it. 

5.1.2 Security of the Implementation 

We discuss the security of our prototype implemented on Windows: 

• Read and write operations: We need to decide which operations count as reads and writes as defined 
in our framework. Our current prototype only considers read and write operations on the Windows registry 
and on the file system. It considers IPCs, such as sending Windows messages to other processes, as 
combined read and write operations. Therefore, when an untrusted program sends a message to a 
trusted program, this message passing violates the integrity model. Since we cannot easily monitor 
device drivers installed by untrusted programs, we consider their installation as violating the integrity 
property. We also assume, optimistically, that after an untrusted program writes data to the network, 
the data will not be read back from the network by a trusted program later; therefore, we do not 
monitor read or write operations on the network. 

• Security of the monitoring mechanism: The principle of complete mediation requires that untrusted 
programs should be unable to attack or circumvent the monitoring mechanism [19]. We install our 
monitor as a kernel driver before we run untrusted programs. Therefore, our monitor can intercept and 
control all the API calls made by untrusted programs from the user space. Since our prototype can 
prohibit untrusted programs from installing kernel drivers by observing certain system calls, untrusted 
code cannot run in the kernel space. Our prototype treats all the processes spawned by untrusted 
processes as untrusted and transitively monitors the spawned processes. Therefore, we believe that 
our monitor is secure from tampering or circumvention. 

• Security of the logging mechanism: We have discussed the security of the logging mechanism of the 
framework in Section 5.1.1. In the implementation, we need to ensure that no untrusted process can 
tamper with the logs. Since our framework hooks into all the API calls that access the file system, it 
protects the logs by denying access to them from all processes except the logging process. 

• Security of the recovery mechanism: Since our framework maintains system integrity, recovery can 
always succeed. In particular, before the monitor begins recovery, it aborts the untrusted process 
(and any process spawned by it). Therefore, the process cannot interfere directly with the recovery 
mechanism. 
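
The transitive treatment of spawned processes, and the set of processes aborted before recovery, can be sketched as follows. This is an illustrative model only; the prototype enforces these rules inside a kernel driver, and the names ProcessTracker, on_spawn, and processes_to_abort are hypothetical.

```python
class ProcessTracker:
    """Tracks which processes are untrusted and who spawned whom."""

    def __init__(self):
        self.untrusted = set()
        self.children = {}  # parent pid -> list of child pids

    def mark_untrusted(self, pid):
        self.untrusted.add(pid)

    def on_spawn(self, parent, child):
        self.children.setdefault(parent, []).append(child)
        # Any process spawned by an untrusted process is itself untrusted.
        if parent in self.untrusted:
            self.untrusted.add(child)

    def processes_to_abort(self, root):
        """Return the untrusted process and every process descended from it."""
        to_abort, stack = set(), [root]
        while stack:
            pid = stack.pop()
            if pid not in to_abort:
                to_abort.add(pid)
                stack.extend(self.children.get(pid, []))
        return to_abort


tracker = ProcessTracker()
tracker.mark_untrusted(100)   # hypothetical installer pid
tracker.on_spawn(100, 101)    # installer spawns a helper
tracker.on_spawn(101, 102)    # helper spawns another process
tracker.on_spawn(1, 2)        # unrelated trusted processes
print(sorted(tracker.processes_to_abort(100)))
```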

5.2 Correctness 

Within our framework, the correctness of an implementation relies on our ability to capture accurately 
the semantics of the interactions between a process and the operating system. For example, the monitor 
must understand the precise effects of each system call. If a system call has unknown side effects, we 
cannot ensure the correctness of our recovery mechanism. In general, we believe that we can interpret such 
semantics correctly through a careful implementation. 

6 Related Work 

6.1 Safe Execution Environments 

Perhaps the most closely related work to ours is SEE [21, 13], which proposed the idea of using one-way 
isolation to create a safe execution environment. It creates a temporary file system for storing data written by 
untrusted programs. When an untrusted program opens a file for writing for the first time, SEE copies the file 
to the temporary file system and redirects all subsequent accesses to the file by untrusted programs to the file 
in the temporary file system. When the untrusted program finishes, the user decides whether to commit the 
data in the temporary file system back into the original file system. We view SEE as a dual to our approach: 
SEE allows all the untrusted programs to run to completion, but it may not be able to commit data 
written by certain untrusted programs back to the original file system. On the other hand, our framework 
allows untrusted programs to write data into the file system immediately (we do not keep a temporary file 
system), but our framework may prohibit some untrusted programs from running to completion. From 
a more practical perspective, SEE implemented its approach on Linux and tested it on trusted programs, 
while we implemented our framework on Windows and tested it on real malware programs. Because most 
malware programs target Windows, our prototype is a more convincing demonstration of this framework's 
effectiveness in containing malware. 

Our work is also related to the general isolation strategy with virtual machines, which provide an 
effective, reliable mechanism for isolating untrusted applications. There are two types of virtual machine 
monitors. Type 1 virtual machine monitors run on the hardware directly. They achieve high performance but 
provide less isolation and security. Type 2 virtual machine monitors run on top of an operating system; they 
offer excellent isolation but suffer from poor performance. King et al. added support for Type 2 virtual 
machine monitors to the Linux kernel to achieve high performance [12]. However, using virtual machines to 
execute untrusted programs has its shortcomings. First, untrusted programs running inside a virtual machine 
cannot access resources created by programs running outside the virtual machine, which may break many 
programs. Second, virtual machines are expensive. To run each untrusted program, one needs to set up 
a new virtual machine and install a complete operating system inside it, which takes considerable 
human time and system resources. 

6.2 Logging and Recovery 

Our framework is inspired by recovery-oriented computing (ROC), which is a framework for recovering 
from system component failure and operator errors [17, 4]. It contains three stages: rewind, repair, and 
replay. Its threat model is that any component in the system may fail, and that the operator may make a 
mistake at any time. Since our goal is to run untrusted programs safely, we need a different threat model: we 
assume that most applications on the system are trustworthy, so we can focus on monitoring and logging a 
few untrusted applications. Therefore, our framework has a much smaller overhead for logging and recovery. 
Our framework also avoids possibly expensive snapshots required in ROC. 

Logging has been used for replaying system events. ReVirt uses logging for intrusion detection. It runs 
applications inside a virtual machine and logs their events. Then, it analyzes intrusions by replaying the 
logged events [7]. King et al. use logging for debugging operating systems [11]. They run an operating 
system inside a virtual machine, log all its events, and use the logs to debug the operating system. We 
use logging for a different purpose: we want to recover the system to a safe state, rather than replay the 
events. This difference requires that we design our logging system differently. If logging is for replaying 
all the events, one needs to take a snapshot of the system and log all the events afterward. During replay, 
one sets the system state to the snapshot and replays all the events. In contrast, if logging is for recovering 
the system to a safe state, we do not need to take a snapshot of the system; we just need to log the 
events. During recovery, we start from the current state of the system and undo each offending event in 
reverse chronological order. This approach avoids taking a system snapshot, which may be very 
expensive. Logging has also been used for system recovery. A log-structured file system [18] takes this idea 
even further: The entire file system is in a log-like structure, which speeds up both file writing and crash 
recovery. It influenced the design of file system recovery in our framework. 
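
The snapshot-free recovery described above can be sketched as an undo pass over the log. This is a minimal model under stated assumptions, not the prototype's implementation: the system state is a dictionary, each log entry is a hypothetical (location, old_data) pair, and old_data is None when the logged operation created new data.

```python
def recover(store, log):
    """Undo logged operations in reverse chronological order, in place."""
    for location, old_data in reversed(log):
        if old_data is None:
            # The operation created this location: remove it.
            store.pop(location, None)
        else:
            # The operation replaced trusted data: restore the old data.
            store[location] = old_data


# Current (infected) state and the log of the untrusted program's writes.
store = {
    "hosts": b"127.0.0.1 update.example",   # overwritten by the malware
    "dropper.tmp": b"payload",              # created by the malware
    "notes.txt": b"user data",              # never touched
}
log = [
    ("hosts", b"127.0.0.1 localhost"),      # first event: replaced trusted data
    ("dropper.tmp", None),                  # second event: created a new file
]

recover(store, log)
print(store)
```

Recovery starts from the current state and touches only the locations named in the log, so data written by trusted programs in the meantime (such as notes.txt here) is left alone.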

In Windows, a user can take a snapshot of the registry and restore the registry to the snapshot later. The 
user, however, cannot take a snapshot of the file system and restore it. Our framework logs sufficient 
information for recovering all the reversible system resources, such as the registry and the file system. A 
more prominent difference is that Windows' snapshot-based approach is too coarse-grained, because it rolls 
back the effects of all the applications that ran after the snapshot. However, a user typically runs trusted 
and untrusted programs in an interleaved fashion, and therefore does not want to roll back the effects of the 
trusted applications. Our framework offers a fine-grained solution: It can roll back each individual 
application without affecting other applications. 

Reparable file service (RFS) [23] uses a similar idea of logging and recovery to repair compromised 
network file servers, such as NFS servers. It interposes an RFS server between the NFS server and clients 
for logging file update operations, and these logs can be used later for rolling back these operations. It 
is used for a different purpose from that of our approach, and as such, it is more complicated: It requires 
modifying all the NFS clients as well as interposing the RFS server between the NFS server and its clients. 
Moreover, it logs all the write operations by untrusted processes. Because of this, an adversarial process can 
mount a DoS attack on the system by repeatedly rewriting the same data location to generate numerous log 
entries. In comparison, our framework will only log the first such write operation by the adversarial 
process, and therefore is immune to such a DoS attack. 

Our logging mechanism employs a simple taint analysis to track the trustworthiness of data. Similar 
ideas have been used for many other purposes. Chow et al. proposed to use whole-system simulation with 
taint analysis to analyze how sensitive data are handled in large programs, such as Apache and Emacs [5]. 
Newsome et al. used dynamic taint analysis for automatic detection of overwrite attacks in processes. 
BackTracker [10] automatically identified potential sequences of steps that occurred in an intrusion. Starting 
with a single detection point, it identified files and processes that could have affected that detection point. 
In comparison, our framework tracks the propagation of untrusted data for preserving system integrity and 
removing malware. 

6.3 Malware Detection and Recovery 

Security mechanisms have three goals: prevention, detection, and recovery [3]. Malware is hard to de- 
tect, because it is difficult to decide if a software program is malicious. One might statically analyze the 
program to search for known patterns of malware, but this approach suffers from two inherent limitations: 
(1) It cannot detect new malware; and (2) sophisticated malware writers can craft polymorphic malware 
to escape signature-based detection. Although one can derive more robust signatures for detecting certain 
polymorphic malware, this is invariably an arms race between the malware writers and the defenders [6]. Al- 
ternatively, one can monitor malware at runtime and watch out for malicious behaviors. Since this approach 
needs runtime signatures of malware, it suffers from similar limitations as static detection. Recovery-based 
approaches overcome these limitations because they require neither static nor runtime malware signatures. 
They monitor an untrusted program's execution, log the program's operations, and recover the system by 
reversing these operations using their logs. 

There are many commercial solutions for protection from malware. They focus on signature-based 
detection and recovery. While such solutions for worms and viruses have been around for a while, 
anti-spyware tools have only recently become widespread. Now, there are hundreds of tools available 
on the Internet [9]. Unfortunately, many of these tools install spyware themselves. Howes has conducted an 
in-depth survey of the more legitimate anti-spyware tools [9]. He evaluated these tools against a test set of 
malware programs, and found that no tool could find and recover from the entire test set. In fact, even the 
best tools missed a quarter of the files installed by the spyware applications. Like all signature-based tools, 
these current commercial applications lack the ability to detect unknown malware. 

HijackThis is a freeware tool for helping diagnose malware infections [8]. It inspects a system's settings 
in order to help determine what malware applications are resident on it. It enumerates registry keys, config- 
uration files, and browser helper objects, among others. Through visual inspection the user can determine 
if these settings are legitimate or if they are due to malware. This approach has limitations and the tool is 
mostly useful for helping assess current system state. 

7 Conclusions and Future Work 

We have described Back to the Future, a novel framework for automatically removing malware and 
repairing its damage to the system. The framework preserves system integrity while the user is running 
untrusted programs, and allows untrusted programs to run as long as possible until they may harm trusted 
programs. The framework achieves these goals by monitoring untrusted programs, logging their operations, 
and using the logs to remove malware and to restore infected data. We implemented this framework on 
Windows and tested our prototype on real spyware, adware, Trojan horses, and email worms. Comparing our 
tool with two popular commercial anti-malware tools, we found that our tool could detect all the malware's 
modifications to the system that were detected by the commercial tools, but the commercial tools overlooked 
up to 97% of the modifications that were detected by our tool. The CPU and storage overhead caused by 
our tool is acceptable. 

There are interesting directions for future work. Our framework currently divides all the applications 
into two categories: trusted and untrusted. It does not track dataflow among different untrusted programs. 
Therefore, once the user decides to remove one untrusted program, all the other untrusted programs must 
be removed as well. Under certain situations, however, the user may desire a finer-grained containment 
mechanism, where the framework would place untrusted applications into different categories so that it may 
remove all the programs in one category without having to remove any programs in the other categories. 
We are working on extending our framework to achieve this goal. In addition, although we have focused 
on recovery in this paper, the framework also provides a foundation for developing other defenses. We are 
studying how to use the logs generated by the framework for analyzing and detecting malware. 